Abstract


A simple procedure is proposed to determine a sample size for estimating the mean weight of items in a problem of obtaining a batch of a large number of items. Suppose it is desired to obtain a large number $N_s$ of items for which individual counting is impractical, but one can demand a batch to weigh at least $w$ units and hope that the number of items in the batch is close to the desired number $N_s$. If the items have mean weight $\theta$, it is reasonable to have $w$ equal to $\theta N_s$ when $\theta$ is known. When $\theta$ is unknown, one can take a sample of size $n$, not bigger than $N_s$, estimate $\theta$ by a good estimator $\hat\theta_n$ and set $w$ equal to $\hat\theta_n N_s$. The proposed procedure determines the sample size to be the integer closest to $\rho C N_s$, where $C$ is a function of the cost coefficients, if the coefficient of variation $\rho$ is known. It is shown to be optimal in some sense. If $\rho$ is unknown, a simple sequential procedure is proposed for which the average sample number is shown to be asymptotically equal to the optimal fixed sample size. When the weights are assumed to have a gamma distribution given $\theta$ and $\theta$ has a prior inverted gamma distribution, the optimal sample size in some sense can be found to be the nonnegative integer closest to $\rho C N_s + \rho^2 A(\rho C - 1)$, where $A$ is a known constant given in the prior distribution.


Key  Words: 

Optimal  sample  size;  total  weight;  mean  weight;  nonparametric;  sequential 
procedure;  Bayes  procedure. 


1.  Introduction. 


Suppose it is desired to obtain a batch of a large number $N_s$ of items and it is impractical to count them individually. However, it is possible to require the batch to have at least a certain weight $w$ and hope that the number $N$ of items in the batch is close to $N_s$. The problem then is to determine $w$. If the weight of the items is constant and is equal to $\theta$ each, then $w = \theta N_s$. When a batch with weight $w$ is delivered, it will contain exactly $N_s$ items. However, if the weights of the items follow a distribution with mean $\theta$ and nonzero variance $\sigma^2$, then even if $\theta$ is known, the reasonable weight $w = \theta N_s$ will not yield a batch of exactly $N_s$ items. Instead, the number $N$ of items is determined by

$$N = \inf\{k \ge 1 : X_1 + \cdots + X_k \ge w\}, \tag{1.1}$$

where the $X$'s are the weights of the items and the actual total weight is $w^* = X_1 + \cdots + X_N$. Even in this case of known mean $\theta$, $N$ so determined will incur a mean squared error of

$$E(N - N_s)^2 \tag{1.2}$$

which is not zero unless $\sigma = 0$. If the mean weight $\theta$ of each item is unknown, it is possible to take a sample $\{X_1, \ldots, X_n\}$ of size $n$ (not bigger than $N_s$) to have a good estimate $\hat\theta_n$ of $\theta$ and determine

$$w = X_1 + \cdots + X_n + (N_s - n)\hat\theta_n. \tag{1.3}$$

(The case of known $\theta$ corresponds to $n = 0$ and $\hat\theta_n = \theta$.) The original problem has then become that of determining (a) a sample size $n$, (b) a good estimate $\hat\theta_n$ of $\theta$ and (c) the total weight $w$ demanded. Guttman and Menzefricke (1986) have investigated this problem by assuming that the distribution of the $X$'s given $\theta$ is normal with unknown mean $\theta$ and known variance $\sigma^2$ and $\theta$ has a known prior normal distribution. In this case, they have attempted to choose $n$ so that

$$E(K_e(N - N_s)^2 + K_s n) \tag{1.4}$$

is minimized, where $K_e$ and $K_s$ are the known cost coefficients, $\hat\theta_n$ is the posterior mean and $w$ is $X_1 + \cdots + X_n + (N_s - n)\hat\theta_n$. They have applied a fundamental equation in renewal theory given as (2.1) in their paper to compute (1.4). However, since the $X$'s are assumed to have a normal distribution, (2.1) would not hold. Although it is well known (cf. Feller II, p. 372) that the asymptotic distribution of $N$ given $w$ is normal with mean $w/\theta$ and variance $w\sigma^2/\theta^3$ as $N_s$ becomes large, it is not clear how the development in Guttman and Menzefricke (1986) is theoretically justified. Moreover, even if the development were fully justified, the determination of the optimal sample size $n$ would not be easy; it would involve some quite complicated computation.

In this note, we shall develop a treatment of the aforementioned problem using asymptotic renewal theory. As a result, the determination of a sample size reduces to simple arithmetic under various scenarios. Specifically, we shall make the following standing assumptions. Let $X_1, X_2, \ldots$ be independent and identically distributed random variables (representing the individual weights of the items and not necessarily normal) with positive mean $\theta$ and variance $\sigma^2$. For each nonnegative integer $n$, let $w$ be defined as in (1.3) and $N$ be defined as in (1.1). The problem is to choose $n$ and $\hat\theta_n$ in (1.3) so that (1.4) is minimized in some sense. We shall specify in what sense $n$ is chosen optimally in each of the following cases being studied: (a) the mean $\theta$ is a known constant, (b) the mean $\theta$ is an unknown constant, (c) the mean $\theta$ is an unknown random variable having a known distribution and (d) the mean $\theta$ is an unknown random variable having an unknown distribution.

2.  The mean $\theta$ is a known constant.

In this section, instead of assuming normality for the distribution of the $X$'s, we shall assume that the $X$'s follow a nonparametric distribution and we know the mean $\theta$. In this case, the obvious choice of $n$ is zero and the natural choice of $\hat\theta_n$ is $\theta$. Then $w = \theta N_s$, and $N$ becomes the smallest integer $k$ such that $X_1 + \cdots + X_k$ is at least $\theta N_s$. The exact computation of $E(K_e(N - N_s)^2 + K_s n)$ is impossible. However, by the well-known renewal theory (cf. Chow et al (1979)), as $N_s$ becomes large,

$$E(K_e(N - N_s)^2) \approx K_e N_s (\sigma/\theta)^2. \tag{2.1}$$

We shall consider the right-hand side of (2.1) as an inherent fixed cost which cannot be eliminated even though we may know $\theta$.
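The renewal approximation for the mean squared error with known mean can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original development: the gamma weight distribution, the values of `n_s`, `theta` and `shape`, and the function names are all illustrative assumptions, and $K_e$ is taken to be 1.

```python
import random

def items_in_batch(w, draw_weight):
    """Return N = inf{k >= 1 : X_1 + ... + X_k >= w}, as in (1.1)."""
    total, k = 0.0, 0
    while True:
        total += draw_weight()
        k += 1
        if total >= w:
            return k

def simulated_mse(n_s, theta, shape, reps, seed=0):
    """Monte Carlo estimate of E(N - N_s)^2 when w = theta * N_s."""
    rng = random.Random(seed)
    # Assumed weight model: gamma with mean theta and (sigma/theta)^2 = 1/shape.
    draw = lambda: rng.gammavariate(shape, theta / shape)
    w = theta * n_s
    return sum((items_in_batch(w, draw) - n_s) ** 2 for _ in range(reps)) / reps

# Hypothetical values: (sigma/theta)^2 = 1/4 and N_s = 400.
n_s, theta, shape = 400, 1.0, 4.0
approx = n_s / shape  # N_s * (sigma/theta)^2, the approximation with K_e = 1
mse = simulated_mse(n_s, theta, shape, reps=1000)
print(f"simulated MSE = {mse:.1f}, renewal approximation = {approx:.1f}")
```

With a moderately large $N_s$ the two printed numbers should be of the same order and close to each other; the agreement is only asymptotic, so a small discrepancy is expected.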

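3.  The mean $\theta$ is an unknown constant.

Again, we assume the $X$'s follow a nonparametric distribution; but this time even the mean $\theta$ is unknown. Suppose a sample $\{X_1, \ldots, X_n\}$ of fixed sample size $n$ is taken. A reasonable choice of $\hat\theta_n$ is $(X_1 + \cdots + X_n)/n$, which is also the nonparametric maximum likelihood estimate. Then

$$N_n = N - n = \inf\{k \ge 1 : X_{n+1} + \cdots + X_{n+k} \ge \hat\theta_n (N_s - n)\}. \tag{3.1}$$

The risk function is

$$\begin{aligned}
E(K_e(N - N_s)^2 + K_s n)
&= K_e E\Big(N_n - (N_s - n)\frac{\hat\theta_n}{\theta}\Big)^2 + K_e (N_s - n)^2 E\Big(\frac{\hat\theta_n - \theta}{\theta}\Big)^2 + K_s n \\
&\quad + 2K_e (N_s - n) E\Big(N_n - (N_s - n)\frac{\hat\theta_n}{\theta}\Big)\Big(\frac{\hat\theta_n - \theta}{\theta}\Big) \\
&= K_e E\Big(N_n - (N_s - n)\frac{\hat\theta_n}{\theta}\Big)^2 + K_e \frac{(N_s - n)^2}{n}\,\frac{\sigma^2}{\theta^2} + K_s n \\
&\quad + 2K_e (N_s - n) E\Big(N_n - (N_s - n)\frac{\hat\theta_n}{\theta}\Big)\Big(\frac{\hat\theta_n - \theta}{\theta}\Big).
\end{aligned} \tag{3.2}$$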

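By the renewal theory, as $N_s$ becomes large,

$$E\Big(N_n - (N_s - n)\frac{\hat\theta_n}{\theta}\Big)^2 \approx (N_s - n)(\sigma/\theta)^2, \tag{3.3}$$

$$(N_s - n) E\Big(N_n - (N_s - n)\frac{\hat\theta_n}{\theta}\Big)\Big(\frac{\hat\theta_n - \theta}{\theta}\Big) = O\Big(\frac{N_s}{\sqrt{n}}\Big). \tag{3.4}$$

Therefore the risk function is asymptotically

$$\begin{aligned}
&K_e N_s (\sigma/\theta)^2 + K_e \frac{(N_s - n)^2}{n} (\sigma/\theta)^2 + (K_s - K_e(\sigma/\theta)^2)\, n \\
&\quad = -K_e N_s (\sigma/\theta)^2 + K_e (\sigma/\theta)^2 \frac{N_s^2}{n} + K_s n.
\end{aligned} \tag{3.5}$$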

Since the inherent fixed cost in (3.5) does not involve the sample size $n$, we shall consider the following transformed risk function (with $n$ not greater than $N_s$)

$$R_n = K_e (\sigma/\theta)^2 \frac{N_s^2}{n} + K_s n, \tag{3.6}$$

which is minimized by taking $n^*$ to be the integer closest to

$$(\sigma/\theta)\, N_s\, (K_e/K_s)^{1/2}, \tag{3.7}$$

and the transformed risk function is approximately

$$2 (\sigma/\theta)\, N_s\, (K_e K_s)^{1/2}. \tag{3.8}$$
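The arithmetic in (3.6) to (3.8) can be illustrated with a short sketch. The function names and the numerical inputs below are hypothetical and chosen only for illustration; `rho` stands for the coefficient of variation $\sigma/\theta$.

```python
import math

def transformed_risk(n, n_s, rho, k_e, k_s):
    """R_n of (3.6): K_e * rho^2 * N_s^2 / n + K_s * n."""
    return k_e * rho**2 * n_s**2 / n + k_s * n

def optimal_sample_size(n_s, rho, k_e, k_s):
    """Integer closest to rho * N_s * sqrt(K_e / K_s), as in (3.7)."""
    return math.floor(rho * n_s * math.sqrt(k_e / k_s) + 0.5)

# Hypothetical inputs: N_s = 20,000, rho = 1/60, equal cost coefficients.
n_s, rho, k_e, k_s = 20_000, 1 / 60, 1.0, 1.0
n_star = optimal_sample_size(n_s, rho, k_e, k_s)

# Brute-force minimization of (3.6) over 1 <= n <= N_s, for comparison.
best_by_search = min(
    range(1, n_s + 1),
    key=lambda n: transformed_risk(n, n_s, rho, k_e, k_s),
)
risk_at_n_star = transformed_risk(n_star, n_s, rho, k_e, k_s)
risk_formula = 2 * rho * n_s * math.sqrt(k_e * k_s)  # approximation (3.8)
print(n_star, best_by_search, risk_at_n_star, risk_formula)
```

For these inputs the closed form and the exhaustive search select the same sample size, and the attained risk differs from (3.8) only through the rounding of $n^*$ to an integer.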

Now if the coefficient of variation $\sigma/\theta$ is known, then (3.7) is computable, and one simply takes a sample of size $n^*$, which is optimal in the sense of minimizing the transformed risk function (3.6), and determines an additional weight of $(N_s - n^*)(X_1 + \cdots + X_{n^*})/n^*$. If the coefficient of variation $\sigma/\theta$ is also unknown, then one can determine the sample sequentially by

$$\tau = \inf\Big\{ n \ge m : n \ge N_s \Big(\frac{K_e}{K_s}\Big)^{1/2} \frac{\hat\sigma_n}{\hat\theta_n} \Big\}, \tag{3.9}$$

where $m$ is a positive integer greater than one and

$$\hat\theta_n = (X_1 + \cdots + X_n)/n \quad\text{and} \tag{3.10}$$

$$\hat\sigma_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat\theta_n)^2. \tag{3.11}$$

Then

$$\tau = \inf\Big\{ n \ge m : \frac{\hat\theta_n}{\hat\sigma_n} \ge \frac{1}{\lambda n} \Big\}, \tag{3.12}$$

where $\lambda = \frac{1}{N_s}(K_s/K_e)^{1/2}$. That is, take a sample of size $\tau$ and determine the additional weight of $(N_s - \tau)(X_1 + \cdots + X_\tau)/\tau$. By the strong law of large numbers, it is straightforward to show that as $N_s$ becomes large, provided the distribution of $X$ is continuous or $m$ goes to infinity with $N_s$ but at a slower rate, $\tau/n^* \to 1$ almost surely; that is, the ratio of the sample size $\tau$ over the optimal sample size goes to one almost surely. The following theorem on the average sample number is also true.


Theorem.  $E\tau / n^* \to 1$ as $N_s$ goes to infinity.

That is, the average sample size of the sequential procedure is equal to the optimal sample size to the first order. The precise conditions on the $X$'s and the proof are given in the Appendix, as the proof is rather technical.
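The sequential procedure of this section can be sketched as a simulation. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the stopping condition is taken to be $\hat\theta_n/\hat\sigma_n \ge 1/(\lambda n)$ with $\lambda = (K_s/K_e)^{1/2}/N_s$ and $\hat\sigma_n^2$ computed with divisor $n$, the weights are assumed to be gamma distributed, and the values of `n_s`, `theta`, `shape` and `m` are hypothetical.

```python
import math
import random

def sequential_sample(draw_weight, n_s, k_e, k_s, m=2):
    """Sample until the first n >= m with theta_hat / sigma_hat >= 1 / (lam * n).

    Returns (tau, w), where w = N_s * theta_hat_tau is the total weight demanded.
    """
    lam = math.sqrt(k_s / k_e) / n_s
    n, mean, m2 = 0, 0.0, 0.0
    while True:
        x = draw_weight()
        n += 1
        # Running mean and sum of squared deviations (Welford update).
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        sigma_hat = math.sqrt(m2 / n)  # divisor n (assumed)
        # Same condition as theta_hat / sigma_hat >= 1 / (lam * n),
        # rearranged to avoid dividing by sigma_hat.
        if n >= m and lam * n * mean >= sigma_hat:
            return n, n_s * mean

# Hypothetical setting: gamma weights with mean 2 and sigma/theta = 1/4.
rng = random.Random(1)
n_s, theta, shape, k_e, k_s = 2000, 2.0, 16.0, 1.0, 1.0
draw = lambda: rng.gammavariate(shape, theta / shape)
n_star = (1 / math.sqrt(shape)) * n_s * math.sqrt(k_e / k_s)  # fixed-size optimum

runs = [sequential_sample(draw, n_s, k_e, k_s, m=10) for _ in range(200)]
mean_ratio = sum(tau for tau, _ in runs) / len(runs) / n_star
print(f"average tau / n* = {mean_ratio:.3f}")
```

The starting size `m=10` is used here so that the rule does not stop on an accidentally tiny early variance estimate; this mirrors the remark that $m$ may be allowed to grow slowly with $N_s$. The sketch does not enforce the constraint that the sample size not exceed $N_s$.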


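4.  The mean $\theta$ is an unknown random variable having a known distribution.

In this section, we shall assume that given $\theta$, the $X$'s have a parametric density and the coefficient of variation $\rho = \sigma/\theta$ is known. Specifically, given $\theta$, let the probability density of the $X$'s be

$$f(x; \theta) = \begin{cases} \dfrac{\alpha (\alpha x)^{\alpha - 1} e^{-\alpha x/\theta}}{\theta^{\alpha}\, \Gamma(\alpha)}, & x > 0, \\[2mm] 0, & x \le 0, \end{cases} \tag{4.1}$$

where $\alpha = 1/\rho^2$, and $\theta$ has a density

$$\pi(\theta) = \begin{cases} \dfrac{b^{a} e^{-b/\theta}}{\theta^{a+1}\, \Gamma(a)}, & \theta > 0, \\[2mm] 0, & \theta \le 0, \end{cases} \tag{4.2}$$

where $a$ and $b$ are known positive constants. Following the development as in Section 3, we have an asymptotic risk equal to

$$K_e \frac{N_s - n}{\alpha} + K_e (N_s - n)^2 E\Big(\frac{\hat\theta_n - \theta}{\theta}\Big)^2 + K_s n. \tag{4.3}$$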


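An obvious choice of $\hat\theta_n$ is

$$\hat\theta_n = \frac{E(\theta^{-1} \mid X_1, \ldots, X_n)}{E(\theta^{-2} \mid X_1, \ldots, X_n)} = \frac{\alpha (X_1 + \cdots + X_n) + b}{n\alpha + a + 1}, \tag{4.4}$$

and the expression in (4.3) becomes

$$K_e \frac{N_s - n}{\alpha} + K_e (N_s - n)^2 \frac{1}{n\alpha + a + 1} + K_s n, \tag{4.5}$$

which is minimized by taking $n$ to be the nonnegative integer closest to

$$n^* = \Big(\frac{K_e}{K_s}\Big)^{1/2} \rho N_s + (a + 1)\Big( \Big(\frac{K_e}{K_s}\Big)^{1/2} \rho^3 - \rho^2 \Big). \tag{4.6}$$

The total weight can then be determined by

$$w = X_1 + \cdots + X_{n^*} + (N_s - n^*)\, \hat\theta_{n^*}. \tag{4.7}$$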
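As an illustration of (4.4) to (4.6), the sketch below evaluates the Bayes estimate and compares the closed-form sample size with a brute-force minimization of (4.5). The function names and all numerical inputs are hypothetical, and the code assumes the gamma and inverted gamma forms of (4.1) and (4.2) with $\alpha = 1/\rho^2$.

```python
import math

def bayes_estimate(xs, rho, a, b):
    """theta_hat_n of (4.4): (alpha * sum(xs) + b) / (n * alpha + a + 1)."""
    alpha = 1.0 / rho**2
    return (alpha * sum(xs) + b) / (len(xs) * alpha + a + 1)

def bayes_risk(n, n_s, rho, k_e, k_s, a):
    """Asymptotic risk (4.5) for a sample of size n."""
    alpha = 1.0 / rho**2
    return (k_e * (n_s - n) / alpha
            + k_e * (n_s - n) ** 2 / (n * alpha + a + 1)
            + k_s * n)

def bayes_sample_size(n_s, rho, k_e, k_s, a):
    """Nonnegative integer closest to the expression in (4.6)."""
    c = math.sqrt(k_e / k_s)
    value = c * rho * n_s + (a + 1) * (c * rho**3 - rho**2)
    return max(0, math.floor(value + 0.5))

# Hypothetical inputs for the sample-size comparison.
n_s, rho, k_e, k_s, a = 20_000, 1 / 60, 1.0, 1.0, 3600
closed_form = bayes_sample_size(n_s, rho, k_e, k_s, a)
brute_force = min(range(0, n_s + 1),
                  key=lambda n: bayes_risk(n, n_s, rho, k_e, k_s, a))
print(closed_form, brute_force)

# Hypothetical data for the estimate: two weights, rho = 1/2, a = 3, b = 6.
print(bayes_estimate([2.0, 4.0], rho=0.5, a=3, b=6))
```

Rounding the continuous minimizer to the nearest integer is a convenient rule rather than an exact integer optimum; when the continuous minimizer lies very close to a half-integer, the neighbouring integer can have a marginally smaller value of (4.5).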


5.  The mean $\theta$ is an unknown random variable having an unknown distribution.

We shall make the same assumption as in Section 4 except that $a$ and $b$ in the prior distribution are unknown and that $\rho$ may not be known. In this case, the optimal sample size (4.6) cannot be computed and the estimate (4.4) cannot be evaluated. If it is known that $a$ is small (small $a$ corresponds to large variance of the prior distribution), then (4.6) is quite close to

$$\rho N_s (K_e/K_s)^{1/2}. \tag{5.1}$$

We recommend taking the sample size to be the integer $n$ closest to (5.1), estimating $\theta$ by $\hat\theta_n = (X_1 + \cdots + X_n)/n$ and determining the weight by

$$w = N_s (X_1 + \cdots + X_n)/n. \tag{5.2}$$

If $\rho$ is also unknown, the sequential procedure as described in Section 3 is recommended. That is, the sample size $\tau$ is

$$\tau = \inf\Big\{ n \ge m : \frac{\hat\theta_n}{\hat\sigma_n} \ge \Big(\frac{K_e}{K_s}\Big)^{1/2} \frac{N_s}{n} \Big\}, \tag{5.3}$$

where $m$ is a positive integer greater than one,

$$\hat\theta_n = (X_1 + \cdots + X_n)/n, \quad\text{and} \tag{5.4}$$

$$\hat\sigma_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat\theta_n)^2. \tag{5.5}$$

Then the weight determined is

$$w = N_s (X_1 + \cdots + X_\tau)/\tau. \tag{5.6}$$

If $a$ is not known to be small, then (5.1) will not be close to (4.6). There does not seem to be much one can do except when previous data are available. In this case, an empirical Bayes approach can be applied. A study of such an approach will be conducted elsewhere.


6.  An example.

We shall present a table of optimal sample sizes for different values of $\rho$, $a$, $b$, $K_e$, $K_s$ and $N_s$. The optimal sample size in the Bayes columns is computed from (4.6), that is, the nonnegative integer closest to

$$\rho N_s \Big(\frac{K_e}{K_s}\Big)^{1/2} + (a + 1)\, \rho^2 \Big( \rho \Big(\frac{K_e}{K_s}\Big)^{1/2} - 1 \Big), \tag{6.1}$$

and the optimal sample size for the nonparametric column is computed from (3.7), the integer closest to

$$\rho N_s \Big(\frac{K_e}{K_s}\Big)^{1/2}.$$

We have chosen the values such that the first part corresponds quite closely to that reported by Guttman and Menzefricke (1986). Notice that the optimal sample sizes are quite unstable in the Bayes case as $a$ varies. Of course, a large $a$ corresponds to a small variance in the prior distribution, which amounts to saying that the mean is known quite precisely, and thus a smaller sample needs to be taken. The nonparametric column also gives the asymptotic average sample numbers of the sequential procedure, to a first-order approximation, when they are large.




Table  1.  Optimal  Sample  Sizes  (N_s  =  20,000;  K

                          Bayes
    a =   360000    36000     3600      360       36

            3250     3325     3332     3333     3333
             235      323      332      333      333
               0       23       32       33       33
               0        0        2        3        3

           12800    13280    13328    13333    13333
               0     1184     1318     1332     1333
               0        0      117      132      133
               0        0        0       12       13
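The Bayes entries of Table 1 can be recomputed from (6.1). The sketch below is a consistency check, not a restatement of the table's inputs: the values of $\rho$ and of the cost ratio $K_e/K_s$ used for each row are not legible in the table as it survives, so the choices `1/60` and `1/15` for $\rho$ and `100, 1, 0.01, 0.0001` for $K_e/K_s$ are assumptions, picked because they reproduce the printed entries.

```python
import math

def bayes_sample_size(n_s, rho, ke_over_ks, a):
    """Nonnegative integer closest to (6.1)."""
    c = math.sqrt(ke_over_ks)
    value = rho * n_s * c + (a + 1) * rho**2 * (rho * c - 1)
    return max(0, math.floor(value + 0.5))

N_S = 20_000
A_VALUES = [360000, 36000, 3600, 360, 36]
COST_RATIOS = [100.0, 1.0, 0.01, 0.0001]  # assumed K_e / K_s, one per row

def table_part(rho):
    """Rows of Bayes sample sizes for one assumed value of rho."""
    return [[bayes_sample_size(N_S, rho, ratio, a) for a in A_VALUES]
            for ratio in COST_RATIOS]

part_one = table_part(1 / 60)  # assumed rho for the first part
part_two = table_part(1 / 15)  # assumed rho for the second part
for row in part_one + part_two:
    print("".join(f"{value:9d}" for value in row))
```

Under these assumed inputs every printed entry agrees with the corresponding Bayes entry of Table 1, which supports the reconstruction of (6.1) but does not establish that these were the values actually used.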


Appendix. 

Theorem.  Let $X, X_1, X_2, \ldots$ be independent and identically distributed random variables with positive mean $\theta$ and finite nonzero variance $\sigma^2$. For each $n \ge 1$, let

$$\hat\theta_n = (X_1 + \cdots + X_n)/n, \tag{A.1}$$

$$\hat\sigma_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat\theta_n)^2, \quad\text{and} \tag{A.2}$$

$$\tau = \inf\Big\{ n \ge m : \frac{\hat\theta_n}{\hat\sigma_n} \ge \frac{1}{\lambda n} \Big\}, \tag{A.3}$$

where $m$ is a positive integer bigger than one, and $\lambda$ is a positive constant. Assume the distribution of $X$ is continuous. If $E(X^2)^p < \infty$ for some $p \ge 1$, then

(i)  $\{(\lambda\tau)^p,\ 0 < \lambda < 1\}$ is uniformly integrable, and

(ii)  $E|\lambda\tau - \sigma/\theta|^p \to 0$ as $\lambda \to 0$.


Proof:  Consider

For any small $\varepsilon > 0$, let

$$\alpha = \inf\{ n \ge 1 : |\hat\theta_n - \theta| < \varepsilon \}. \tag{A.5}$$

Then since $E(X^2)^p < \infty$, by Lemma 3 in Chow et al (1983), $E\alpha^{2p} < \infty$. Make copies of $\alpha$ and denote them by $\alpha^{(1)}, \alpha^{(2)}, \ldots$ and let $\alpha_n = \alpha^{(1)} + \cdots + \alpha^{(n)}$. For each $n \ge 1$, let

$$Y_n = \big(\sigma^2 - (X_{\alpha_{n-1}+1} - \theta)^2\big) + \cdots + \big(\sigma^2 - (X_{\alpha_n} - \theta)^2\big), \tag{A.6}$$

where $\alpha_0 = 0$. Since $E(X^2)^p < \infty$ and $E\alpha^{2p} < \infty$, by Lemma 2(ii) in Chow et al (1983), $E|Y_1|^p < \infty$. Define

$$\beta = \inf\{ n \ge 1 : |Y_1 + \cdots + Y_n| \ \ldots \} \tag{A.7}$$

Since $E|Y_1|^p < \infty$, by Lemma 3 in Chow et al (1983), $E\beta^p < \infty$. Let $t = \alpha_\beta$. Since $E\alpha^{2p} < \infty$ and $E\beta^p < \infty$, by Lemma 2(iii) in Chow et al (1983), $Et^p < \infty$. And $t$ is a stopping time. Make copies of $t$ and denote them by $t^{(1)}, t^{(2)}, \ldots$ and let $t_n = t^{(1)} + \cdots + t^{(n)}$. Then for each $n \ge 1$,

$$\hat\sigma_{t_n}^2 \ge \sigma^2 - \varepsilon - \varepsilon^2. \tag{A.8}$$


Hence

(A.9)

By Theorem 1 of Yu (1986), $\{(\lambda\tau)^p,\ 0 < \lambda < 1\}$ is uniformly integrable. By the strong law of large numbers

(A.10)

as $\lambda$ goes to zero. Hence as $\lambda$ goes to zero,

$$\lambda\tau - \sigma/\theta \to 0 \ \text{almost surely}, \qquad E|\lambda\tau - \sigma/\theta|^p \to 0; \tag{A.11}$$

in particular, if $\lambda = \frac{1}{N_s}(K_s/K_e)^{1/2}$,